fix: avoid unicode filepath suffix panic #393
Merged
dmtrKovalenko merged 3 commits on Apr 21, 2026
dmtrKovalenko (Owner) approved these changes on Apr 19, 2026:
@copilot resolve the merge conflicts in this pull request
The previous fix (a09292e) only guarded path_ends_with_suffix with path.get(start..), but three problems remained:

1. path_ends_with_suffix: path_bytes[start - 1] reads inside a multi-byte char when start is a valid boundary but start-1 is not. Fixed by scanning backward to find the preceding ASCII byte.
2. path_contains_segment: path[..segment_len] and path[start..end] slice at non-char-boundary offsets when segment is ASCII but the path contains multi-byte UTF-8 (Korean, etc). Fixed with is_char_boundary() checks before each slice.
3. file_has_extension: same byte-offset issue for dot_pos. Fixed with is_char_boundary() check.

Adds regression tests with the exact Korean filenames that caused panics (커리큘럼, 세부_커리큘럼_최종, 설치-및-기본-사용, etc). Merges upstream unicode tests (apostrophe, narrow-space mismatches).
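The is_char_boundary() guard mentioned for path_contains_segment and file_has_extension can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the crate's code: has_dot_extension is a hypothetical helper, and the exact, case-sensitive comparison is an assumption about matching semantics.

```rust
/// Hypothetical helper (not the crate's `file_has_extension`): reports whether
/// `path` ends with `.` followed by `ext`, without ever slicing mid-character.
fn has_dot_extension(path: &str, ext: &str) -> bool {
    // Byte offset where the '.' would have to sit; None if the path is too short.
    let Some(dot_pos) = path.len().checked_sub(ext.len() + 1) else {
        return false;
    };
    // Length arithmetic yields a byte offset, which may fall inside a
    // multi-byte UTF-8 sequence. Slicing there would panic, so check first.
    if !path.is_char_boundary(dot_pos) {
        return false;
    }
    // Safe: `dot_pos` is a verified char boundary.
    path[dot_pos..].strip_prefix('.') == Some(ext)
}
```

For a name such as 커리큘럼 with the extension pdf, the computed dot_pos falls inside a three-byte Hangul syllable, so the guard returns false where an unchecked path[dot_pos..] would panic.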
91cd463 to f067e82
dmtrKovalenko approved these changes on Apr 21, 2026
tmdgusya added a commit to tmdgusya/roach-pi that referenced this pull request on Apr 22, 2026:

0.6.0 crashes the fff-bg indexing thread on UTF-8 multibyte filenames (e.g. Korean, emoji) due to a non-char-boundary &str slice in path_ends_with_suffix() panicking across the FFI boundary. Fixed upstream in dmtrKovalenko/fff#393; pin nightly until a stable 0.6.2 is cut.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
github-actions Bot pushed a commit to tmdgusya/roach-pi that referenced this pull request on Apr 22, 2026:

## [1.9.5](v1.9.4...v1.9.5) (2026-04-22)

### Bug Fixes

* **fff:** pin @ff-labs/fff-node to 0.6.2-nightly.acd2f0c ([4470b66](4470b66)), closes [dmtrKovalenko/fff#393](dmtrKovalenko/fff#393)

### Miscellaneous

* upgrade @ff-labs/fff-node 0.5.2 → 0.6.0 ([3b1ea47](3b1ea47))
Summary
* path_ends_with_suffix() returns false instead of panicking when a byte-derived suffix offset lands inside a multibyte character
* apply_constraints(Constraint::FilePath(...))

Similar PR / duplicate check
I checked existing PRs before opening this:
* (fix: Unicode segmentation crash)
* Constraint::FilePath / path_ends_with_suffix() panic path

This PR is intentionally narrow: it fixes the unchecked path[start..] slice in crates/fff-core/src/constraints.rs without changing matching semantics.

Root cause
path_ends_with_suffix() computed start and then sliced the path with path[start..].
start is a byte offset, not guaranteed to be a UTF-8 char boundary. For Unicode filenames, a non-matching suffix can make start land inside a multibyte codepoint, which panics before constraint filtering can return false.

Fix
Use path.get(start..) instead of unchecked indexing:
* if start is not a valid char boundary, return false
* otherwise keep the existing / boundary behavior

Verification
I avoided using any user-specific filename in tests and instead used synthetic Unicode fixture names.
Reproduction guard added
New tests in crates/fff-core/src/constraints.rs:
* test_path_ends_with_suffix_does_not_panic_on_unicode_suffix
* test_apply_constraints_file_path_with_unicode_suffix
* test_path_contains_segment_does_not_panic_on_unicode_segment

The important regression case uses a synthetic filename such as:
data/유니코드_파일_테스트.csv
and a deliberately non-matching suffix that would previously place the byte offset in the middle of a multibyte character.
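The regression case described above can be sketched as follows. This is an illustration under assumptions rather than the test code from this PR: ends_with_suffix_checked and unicode_suffix_regression are hypothetical names, the suffix "x.csv" is an illustrative choice, and the separator/boundary handling of the real path_ends_with_suffix() is omitted.

```rust
/// Hypothetical stand-in for the checked suffix test (not the crate's
/// `path_ends_with_suffix`, whose separator handling is omitted here).
fn ends_with_suffix_checked(path: &str, suffix: &str) -> bool {
    // Byte offset where the suffix would start; None if it is longer than the path.
    let Some(start) = path.len().checked_sub(suffix.len()) else {
        return false;
    };
    // `str::get` returns None when `start` is not a char boundary,
    // whereas `&path[start..]` would panic in that case.
    path.get(start..) == Some(suffix)
}

/// Regression-style check using the synthetic fixture name from the text.
/// Returns (offset is a char boundary, bad suffix matches, ".csv" matches).
fn unicode_suffix_regression() -> (bool, bool, bool) {
    let path = "data/유니코드_파일_테스트.csv";
    // Illustrative non-matching suffix: one byte longer than ".csv", so its
    // start offset falls inside the multi-byte character before the dot.
    let bad = "x.csv";
    let start = path.len() - bad.len();
    (
        path.is_char_boundary(start),
        ends_with_suffix_checked(path, bad),
        ends_with_suffix_checked(path, ".csv"),
    )
}
```

Here the offset computed for "x.csv" is not a char boundary, so the checked version reports a non-match without panicking, while the genuine ".csv" suffix still matches.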
Commands run
cargo test -p fff-search constraints::tests -- --nocapture

Result
All constraint tests pass locally after the fix, including the new Unicode regression coverage.
Scope notes